dolthub/doltgresql#1648: fix VALUES clause type inference #2187

codeaucafe · 2026-01-09T05:56:54Z

Summary

VALUES clause type inference now uses PostgreSQL's common type resolution algorithm instead of using only the first row's type.

existing issue: SELECT * FROM (VALUES(1),(2.01),(3)) v(n) failed with integer: unhandled type: decimal.Decimal

Changes

Add ResolveValuesTypes analyzer rule that:
- Collects types from all VALUES rows (not just the first)
- Uses FindCommonType() to resolve per PostgreSQL's [UNION/CASE type resolution](https://www.postgresql.org/docs/15/typeconv-union-case.html; per FindCommonType doc comment)
- Applies implicit casts to convert values to the common type
- Updates GetField expressions in parent nodes (handles aggregates like SUM)
Add UnknownCoercion expression for unknown to target type coercion
add go and bats tests

Questions for Reviewers

Analyzer rule approach: Is adding this as an analyzer rule (after TypeSanitizer) the right approach? I considered alternatives but this seemed cleanest for handling the two-pass transformation needed for GetField updates. Open to feedback if there's a better pattern.
PostgreSQL version: The code references PostgreSQL 15 docs. Should this stay as-is since doltgresql targets PG 15, or should it use /docs/current/?

Fixes: #1648

Add ResolveValuesTypes analyzer rule to compute common types across all VALUES rows, not just the first row. Previously, DoltgreSQL would incorrectly use only the first value to determine column types, causing errors when subsequent values had different types like VALUES(1),(2.01),(3). Changes: - Two-pass transformation strategy: first pass transforms VDT nodes with unified types, second pass updates GetField expressions in parent nodes - Use FindCommonType() to resolve types per PostgreSQL rules - Apply ImplicitCast for type conversions and UnknownCoercion for unknown-typed literals - Handle aggregates via getSourceSchema() - Add UnknownCoercion expression type for unknown -> target coercion without conversion Tests: - Add 4 bats integration tests for mixed int/decimal VALUES - Add 3 Go test cases covering int-first, decimal-first, SUM aggregate, and multi-column scenarios Refs: dolthub#1648

Hydrocharged · 2026-01-09T12:56:25Z

Greetings! I'll dig into this for the review.

As for the comment regarding documentation, in general the majority of the code is written to Postgres 15 specifications, however we still add features from newer versions when applicable. Chasing the newest version is a never-ending target, but we'll definitely update to a newer target relatively soon. Regarding documentation pinning though, we should never reference the current, as that changes over time. We definitely don't want critical information changing unknowingly. As long as the version is pinned for the documentation, that should be fine.

codeaucafe · 2026-01-09T18:16:29Z

Thank you. Yea I saw the code was mostly toward psql15 spec but wasn't sure if I missed any updates or other docs indicating writing towards late version spec (if much even changed from psql15 spec to later versions' specs?). Thank you for clarifying.

Also, thanks in advance for reviewing the PR.

Hydrocharged

Within my PR comments, I tend to be a bit wordy to try and help with the general direction for future PRs. For this one though, it's a good first draft! I stepped through and modified the code locally to make sure I fully understood what was happening, and the intention behind most of the changes so I could give a decent review. I ended up making some modifications to reduce the overall size of the PR as well (with a lot of the comments left here added in), which you can view here:

https://github.com/dolthub/doltgresql/pull/2193/files

Let me know once the feedback has been addressed, and I'll take another look at it.

Hydrocharged · 2026-01-12T08:53:16Z

server/analyzer/resolve_values_types.go

+				// Unknown type can be coerced to any type without explicit cast
+				// Use UnknownCoercion to report the target type while passing through values
+				newTuples[rowIdx][colIdx] = pgexprs.NewUnknownCoercion(expr, toType)


This seems like an error. The value returned by the expression must align with the reported type. If expr returns a decimal.Decimal, then the reported type must be NUMERIC, otherwise we will panic. We always make the assumption that the correct type is reported for the value (assuming we're working with DoltgresType), and this breaks that assumption, which will cause a panic (potentially far removed from the actual source of incongruity).

In addition, from debugging through the tests, this branch is only called with an expression of type unknown that converts to type text, which both happen to have the same underlying data type string.

After looking through the code a bit more, it looks like this new expression type was added due to a bad practice that we are using throughout the codebase. It is true that we can interpret the unknown type as any type (specifically through an I/O cast), and we're correctly doing that in every place that retrieves a cast by checking if the returned cast function was nil. Considering we are doing it everywhere immediately after the function call, and it's universal everywhere that we want a cast, the logic should be in the function call itself. This extends not just to implicit casts, but assignment and explicit too.

Hydrocharged · 2026-01-12T09:00:20Z

server/expression/implicit_cast.go

+// UnknownCoercion wraps an expression with unknown type to coerce it to a target type.
+// Unlike ImplicitCast, this doesn't perform any actual conversion - it just changes the
+// reported type since unknown type literals can coerce to any type in PostgreSQL.
+type UnknownCoercion struct {


This should be in a different file. We sometimes put multiple nodes, expressions, or passes in a single file, but only when they are deeply tied (something like Value and ValueParam for example). In other projects (such as go-mysql-server) we are far looser on this and made it quite hard to parse the project without the search functions of an IDE, so with Doltgres we're attempting to right some of those wrongs.

Hydrocharged · 2026-01-12T09:09:17Z

server/analyzer/resolve_values_types.go

+// This ensures VALUES(1),(2.01),(3) correctly infers numeric type, not integer.
+func ResolveValuesTypes(ctx *sql.Context, a *analyzer.Analyzer, node sql.Node, scope *plan.Scope, selector analyzer.RuleSelector, qFlags *sql.QueryFlags) (sql.Node, transform.TreeIdentity, error) {
+	// Track which VDTs we transform so we can update parent nodes
+	transformedVDTs := make(map[*plan.ValueDerivedTable]sql.Schema)


As a map value, this is never used. updateGetFieldTypes does not use the parameter at all. This is only used for the len(transformedVDTs) > 0 check below.

Hydrocharged · 2026-01-12T09:17:17Z

server/analyzer/resolve_values_types.go

+func getSourceSchema(n sql.Node) sql.Schema {
+	switch node := n.(type) {
+	case *plan.GroupBy:
+		// GroupBy's Schema() returns aggregate output, but we need the source schema
+		return getSourceSchema(node.Child)
+	case *plan.Filter:
+		return getSourceSchema(node.Child)
+	case *plan.Sort:
+		return getSourceSchema(node.Child)
+	case *plan.Limit:
+		return getSourceSchema(node.Child)
+	case *plan.Offset:
+		return getSourceSchema(node.Child)
+	case *plan.Distinct:
+		return getSourceSchema(node.Child)
+	case *plan.SubqueryAlias:
+		// SubqueryAlias wraps a VDT - get the child's schema
+		return node.Child.Schema()
+	case *plan.ValueDerivedTable:
+		return node.Schema()
+	default:
+		// For other nodes, return their schema directly
+		return n.Schema()
+	}
+}


Functions like this are a bit iffy, as they rely on us to update this function whenever we add a new node in go-mysql-server (GMS). There's no indication in GMS that we need to head to Doltgres to update anything, so this could suddenly fail for untested queries on a seemingly unrelated change. This kind of function should reside in GMS, unless we are extremely confident that we will never need to update this function in the future.

For this specific function, since we seem to be looking for either a *plan.SubqueryAlias or *plan.ValueDerivedTable node, and everything else seems to embed a UnaryNode, we could instead just search for those two, and check if it's a UnaryNode, navigating through the child if it is.

Hydrocharged · 2026-01-12T10:41:05Z

server/functions/framework/common_type.go

+		} else {
+			// candidateType can convert to typ, but not vice versa, so typ is more general
+			// Per PostgreSQL docs: "If the resolution type can be implicitly converted to the
+			// other type but not vice-versa, select the other type as the new resolution type."
+			candidateType = typ
 			if candidateType.IsPreferred {
-				candidateType = typ
+				// "Then, if the new resolution type is preferred, stop considering further inputs."
 				preferredTypeFound = true
 			}
-		} else {
-			return nil, errors.Errorf("found another preferred candidate type")
+		}
+		if preferredTypeFound {
+			break


Since we're breaking if we've found the preferred type, it makes more sense to eliminate the boolean and return once we've found the preferred.

In the old code, we continued to go through the remaining types and check for implicit casts, which could return an error early. For this code, once we've finished the loop, we should do a 2nd loop that ensures that all types have an implicit cast (skipping unknown, the same types will return an identity cast).

Hydrocharged · 2026-01-12T11:45:29Z

server/analyzer/resolve_values_types.go

+				columnTypes[colIdx][rowIdx] = pgType
+			} else {
+				// Non-DoltgresType encountered - should have been sanitized
+				// Return unchanged and let TypeSanitizer handle it


This analyzer step comes after type sanitizer, so it's an error if we still have GMS types

Hydrocharged · 2026-01-12T11:55:10Z

server/analyzer/resolve_values_types.go

+	case *plan.Distinct:
+		return getSourceSchema(node.Child)
+	case *plan.SubqueryAlias:
+		// SubqueryAlias wraps a VDT - get the child's schema


Is this always true? This seems like too general of a node for this assumption to hold true for all possible queries.

Hydrocharged · 2026-01-12T12:59:20Z

server/analyzer/resolve_values_types.go

+}
+
+// updateGetFieldExpr updates a GetField expression to use the correct type from the child schema
+func updateGetFieldExpr(expr sql.Expression, childSchema sql.Schema) (sql.Expression, bool, error) {


An easier (and more robust) way of finding a GetField expression would be to use the transform functions (such as pgtransform.NodeExprsWithOpaque

Hydrocharged · 2026-01-12T13:05:03Z

server/analyzer/resolve_values_types.go

+	if isVDT {
+		// Use WithExpressions to preserve all VDT fields (name, columns, id, cols)
+		// while updating the expressions and recalculating the schema
+		newNode, err := vdt.WithExpressions(flatExprs...)
+		if err != nil {
+			return nil, transform.NewTree, err
+		}
+		return newNode, transform.NewTree, nil
+	}
+
+	// For standalone Values node, use WithExpressions as well
+	newNode, err := values.WithExpressions(flatExprs...)
+	if err != nil {
+		return nil, transform.NewTree, err
+	}
+	return newNode, transform.NewTree, nil


Since both ValueDerivedTable and Values implements sql.Expressioner, we can simplify this with just having a single Expressioner node that we assign them to.

Hydrocharged · 2026-01-12T13:10:59Z

testing/go/values_statement_test.go

+					{Numeric("3")},
+				},
+			},
+			{


We can never have too many tests. These existing tests are fine, but there are permutations mentioned in the comments and code of resolve_values_types.go that we aren't testing here, such as GROUP BY. There's also DISTINCT, LIMIT, etc. Subqueries should be tested too since there is specific logic to handle SubqueryAlias, and that should also be checked to ensure that LIMIT, ORDER BY, etc. work within the subquery. It's possible that some of these things may not work in general (I don't think we have a test with a LIMIT within a subquery), but in those cases they should just be skipped tests so we know to fix it later (and that's assuming that other logic is the cause, if it's due to the PR then it should be fixed or the appropriate TODO should be left within the code itself explaining what's missing).

Hydrocharged · 2026-01-12T13:27:12Z

Also, regarding the first question:

Analyzer rule approach: Is adding this as an analyzer rule (after TypeSanitizer) the right approach? I considered alternatives but this seemed cleanest for handling the two-pass transformation needed for GetField updates. Open to feedback if there's a better pattern.

I didn't want to answer before reviewing to make sure that I understood everything correctly, but the approach of adding another analyzer step works perfectly for this case. In general, if we're using the transform or pgtransform packages, then it should probably be an analyzer rule.

codeaucafe · 2026-01-13T02:10:29Z

@Hydrocharged the detailed comments are fantastic (what I wish all reviews were like). I'll get to addressing the sometime this week.

Thank you!

coffeegoddd added the contribution label Jan 9, 2026

codeaucafe changed the title ~~feat(analyzer): fix VALUES clause type inference~~ dolthub/doltgresql#1648: fix VALUES clause type inference Jan 9, 2026

Hydrocharged self-requested a review January 9, 2026 12:52

codeaucafe marked this pull request as ready for review January 12, 2026 00:15

Hydrocharged requested changes Jan 12, 2026

View reviewed changes

Uh oh!

dolthub/doltgresql#1648: fix VALUES clause type inference #2187

Are you sure you want to change the base?

dolthub/doltgresql#1648: fix VALUES clause type inference #2187

Uh oh!

Conversation

codeaucafe commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Questions for Reviewers

Uh oh!

Hydrocharged commented Jan 9, 2026

Uh oh!

codeaucafe commented Jan 9, 2026

Uh oh!

Hydrocharged left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Hydrocharged commented Jan 12, 2026

Uh oh!

codeaucafe commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codeaucafe commented Jan 9, 2026 •

edited

Loading